An Average - Reward Reinforcement Learning

ثبت نشده
چکیده

Recently, there has been growing interest in average-reward reinforcement learning (ARL), an undiscounted optimality framework that is applicable to many diierent control tasks. ARL seeks to compute gain-optimal control policies that maximize the expected payoo per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish between diierent policies that all reach an absorbing goal state, but incur varying costs. A more selective criterion is bias optimality, which can lter gain-optimal policies to select those that reach absorbing goals with the minimum cost. More generally, bias-optimal policies are gain-optimal policies that also maximize the average-adjusted sum of rewards in each state. While several ARL algorithms for computing gain-optimal policies have been proposed, none of these algorithms can guarantee bias optimality, since this requires solving at least two nested optimality equations. In this paper, we describe a novel model-based ARL algorithm for computing bias-optimal policies. We test the proposed algorithm using an admission control queuing system, and show that it is able to utilize the queue much more eeciently than a gain-optimal method by learning bias-optimal policies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results Editor: Leslie Kaelbling

This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asyn-chronous algorithms from optimal co...

متن کامل

Manufactured in The Netherlands . Average Reward Reinforcement Learning : Foundations , Algorithms , and Empirical

This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asyn-chronous algorithms from optimal co...

متن کامل

Tournament selection in zeroth-level classifier systems based on average reward reinforcement learning

As a genetics-based machine learning technique, zeroth-level classifier system (ZCS) is based on a discounted reward reinforcement learning algorithm, bucket-brigade algorithm, which optimizes the discounted total reward received by an agent but is not suitable for all multi-step problems, especially large-size ones. There are some undiscounted reinforcement learning methods available, such as ...

متن کامل

Adaptive aggregation for reinforcement learning in average reward Markov decision processes

We present an algorithm which aggregates online when learning to behave optimally in an average reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals for aggregating the state space. We derive bounds on the regret our algorithm suffers with respect to an optimal policy. These bounds are only slightly worse than the orig...

متن کامل

Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning

Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well-studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which ooers an elegant way of linking these two paradigms. Although sensitive discount optima...

متن کامل

Model-Based Average Reward Reinforcement Learning

Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning meth...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996